A Debiased MDI Feature Importance Measure for Random Forests

Neural Information Processing Systems

Tree ensembles such as Random Forests have achieved impressive empirical success across a wide variety of applications. To understand how these models make predictions, people routinely turn to feature importance measures calculated from tree ensembles. It has long been known that Mean Decrease Impurity (MDI), one of the most widely used measures of feature importance, incorrectly assigns high importance to noisy features, leading to systematic bias in feature selection. In this paper, we address the feature selection bias of MDI from both theoretical and methodological perspectives. Based on the original definition of MDI by Breiman et al. (1984) for a single tree, we derive a tight non-asymptotic bound on the expected bias of MDI importance of noisy features, showing that deep trees have higher (expected) feature selection bias than shallow ones. However, it is not clear how to reduce the bias of MDI using its existing analytical expression. We derive a new analytical expression for MDI, and based on this new expression, we are able to propose a debiased MDI feature importance measure using out-of-bag samples, called MDI-oob. For both the simulated data and a genomic ChIP dataset, MDI-oob achieves state-of-the-art performance in feature selection from Random Forests for both deep and shallow trees.
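The bias this abstract describes is easy to reproduce with off-the-shelf tools. The sketch below is an illustration only, not the paper's MDI-oob implementation: scikit-learn's `feature_importances_` attribute is the MDI measure discussed, and held-out permutation importance is shown merely as a contrast that does not inherit MDI's bias toward noisy features. The toy data and all variable names are invented.

```python
# Illustration of MDI's bias toward a pure-noise feature (not the paper's
# MDI-oob method). feature_importances_ in scikit-learn is MDI; permutation
# importance on held-out data serves only as a bias-free point of comparison.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
signal = rng.integers(0, 2, size=n)      # truly informative binary feature
noise = rng.normal(size=n)               # pure continuous noise
X = np.column_stack([signal, noise])
y = signal ^ (rng.random(n) < 0.1)       # label = signal with 10% flips

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(
    n_estimators=200, max_features=None, random_state=0
).fit(X_tr, y_tr)

mdi = rf.feature_importances_            # MDI credits the noise feature:
                                         # deep splits on noise fit the
                                         # flipped labels in-sample
perm = permutation_importance(rf, X_te, y_te, random_state=0).importances_mean
print("MDI:", mdi)                       # noise gets substantial MDI mass
print("Permutation (held-out):", perm)   # noise gets ~0 here
```

Because the fully grown trees fit the 10% flipped labels by splitting on the noise feature, MDI assigns it a sizeable share of total importance, while held-out permutation importance leaves it near zero.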



In response to the reviewers' comments, we organize our responses as follows

Neural Information Processing Systems

We thank the reviewers for their valuable feedback, which will significantly improve our paper. This is indeed a limitation of Theorem 1. The ChIP data included in our simulation studies shows that MDI-oob works in this setting. We plan to add this plot to our supplementary material. Reviewers 2 and 3: give theoretical/empirical evidence that MDI-oob can "debias" MDI. Empirically, we compute the MDI-oob for the first simulation.





Matrix Completion with Noisy Side Information

Kai-Yang Chiang, Cho-Jui Hsieh, Inderjit S. Dhillon

Neural Information Processing Systems

We study the matrix completion problem with side information. Side information has been considered in several matrix completion applications, and has been empirically shown to be useful in many cases. Recently, researchers studied the effect of side information for matrix completion from a theoretical viewpoint, showing that sample complexity can be significantly reduced given completely clean features. However, since in reality most given features are noisy or only weakly informative, the development of a model to handle a general feature set, and investigation of how much noisy features can help matrix recovery, remains an important issue. In this paper, we propose a novel model that balances between features and observations simultaneously in order to leverage feature information yet be robust to feature noise. Moreover, we study the effect of general features in theory and show that by using our model, the sample complexity can be lower than matrix completion as long as features are sufficiently informative. This result provides a theoretical insight into the usefulness of general side information. Finally, we consider synthetic data and two applications -- relationship prediction and semi-supervised clustering -- and show that our model outperforms other methods for matrix completion that use features both in theory and practice.


Exploring Content and Social Connections of Fake News with Explainable Text and Graph Learning

Lourenço, Vítor N., Paes, Aline, Weyde, Tillman

arXiv.org Artificial Intelligence

The global spread of misinformation and concerns about content trustworthiness have driven the development of automated fact-checking systems. Since false information often exploits social media dynamics such as "likes" and user networks to amplify its reach, effective solutions must go beyond content analysis to incorporate these factors. Moreover, simply labelling content as false can be ineffective or even reinforce biases such as automation and confirmation bias. This paper proposes an explainable framework that combines content, social media, and graph-based features to enhance fact-checking. It integrates a misinformation classifier with explainability techniques to deliver complete and interpretable insights supporting classification decisions. Experiments demonstrate that multimodal information improves performance over single modalities, with evaluations conducted on datasets in English, Spanish, and Portuguese. Additionally, the framework's explanations were assessed for interpretability, trustworthiness, and robustness with a novel protocol, showing that it effectively generates human-understandable justifications for its predictions. The code and experiments are available at https://github.com/MeLLL-UFF/mu2X/ .


A Closer Look at Multimodal Representation Collapse

Chaudhuri, Abhra, Dutta, Anjan, Bui, Tu, Georgescu, Serban

arXiv.org Artificial Intelligence

We aim to develop a fundamental understanding of modality collapse, a recently observed empirical phenomenon wherein models trained for multimodal fusion tend to rely only on a subset of the modalities, ignoring the rest. We show that modality collapse happens when noisy features from one modality are entangled, via a shared set of neurons in the fusion head, with predictive features from another, effectively masking out positive contributions from the predictive features of the former modality and leading to its collapse. We further prove that cross-modal knowledge distillation implicitly disentangles such representations by freeing up rank bottlenecks in the student encoder, denoising the fusion-head outputs without negatively impacting the predictive features from either modality. Based on the above findings, we propose an algorithm that prevents modality collapse through explicit basis reallocation, with applications in dealing with missing modalities. Extensive experiments on multiple multimodal benchmarks validate our theoretical claims. Project page: https://abhrac.github.io/mmcollapse/.


Exploring the Frontiers of kNN Noisy Feature Detection and Recovery for Self-Driving Labs

Shi, Qiuyu, Li, Kangming, Fehlis, Yao, Persaud, Daniel, Black, Robert, Hattrick-Simpers, Jason

arXiv.org Artificial Intelligence

Self-driving laboratories (SDLs) have shown promise to accelerate materials discovery by integrating machine learning with automated experimental platforms. However, errors in the capture of input parameters may corrupt the features used to model system performance, compromising current and future campaigns. This study develops an automated workflow to systematically detect noisy features, determine sample-feature pairings that can be corrected, and finally recover the correct feature values. A systematic study is then performed to examine how dataset size, noise intensity, and feature value distribution affect both the detectability and recoverability of noisy features. In general, high-intensity noise and large training datasets are conducive to the detection and correction of noisy features. Low-intensity noise reduces detection and recovery but can be compensated for by larger clean training data sets. Detection and correction results vary between features with continuous and dispersed feature distributions showing greater recoverability compared to features with discrete or narrow distributions. This systematic study not only demonstrates a model agnostic framework for rational data recovery in the presence of noise, limited data, and differing feature distributions but also provides a tangible benchmark of kNN imputation in materials data sets. Ultimately, it aims to enhance data quality and experimental precision in automated materials discovery.
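The kNN imputation step this abstract benchmarks can be sketched with scikit-learn's `KNNImputer`. This is an illustration under invented toy data, not the paper's detect-then-recover workflow: detection is stood in for by simply masking the entries assumed to have been flagged as noisy, and recovery is plain kNN imputation over the remaining clean features.

```python
# Hedged sketch of kNN-based recovery of corrupted feature values.
# The detection stage is simulated: entries "flagged as noisy" are set to
# NaN, then KNNImputer fills them from nearest neighbors in the clean
# features (NaNs are ignored in the distance computation).
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)  # feature 2 tracks feature 0

X_noisy = X.copy()
corrupted = rng.choice(200, size=20, replace=False)
X_noisy[corrupted, 2] = np.nan        # entries assumed flagged as noisy

X_rec = KNNImputer(n_neighbors=5).fit_transform(X_noisy)

# Recovery error on the masked entries, against the ground-truth values
err = np.abs(X_rec[corrupted, 2] - X[corrupted, 2]).mean()
print("mean absolute recovery error:", err)
```

Consistent with the abstract's finding, recovery works here because the corrupted feature has a continuous, well-correlated distribution; with a discrete or narrow distribution, or with fewer clean training rows, the neighbors carry less information and the error grows.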


How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

Neural Information Processing Systems

Two competing paradigms exist for self-supervised learning of data representations. Joint Embedding Predictive Architectures (JEPAs) are a class of architectures in which semantically similar inputs are encoded into representations that are predictive of each other. A recent successful approach that falls under the JEPA framework is self-distillation, where an online encoder is trained to predict the output of a target encoder, sometimes with a lightweight predictor network. This is contrasted with the Masked Auto Encoder (MAE) paradigm, where an encoder and decoder are trained to reconstruct missing parts of the input in ambient space rather than its latent representation. A common motivation for using the JEPA approach over MAE is that the JEPA objective prioritizes abstract features over fine-grained pixel information (which can be unpredictable and uninformative).